[BugFix] Ensure num_cached_tokens is non-negative for KV transfer failed requests #37354
Code Review
This pull request addresses a ValueError that occurs when logging metrics for requests that fail KV transfer. The num_cached_tokens for such requests can remain at its default value of -1, which causes a crash when used to increment a counter. The fix applies max(0, ...) to ensure num_cached_tokens is always non-negative when creating the EngineCoreOutput for finished requests. This is a direct and effective solution to the problem. The change is correct and I have no further suggestions.
If you have clear instructions for reproducing this issue, that would be helpful. Thanks!
OK, I think this commit works as well.
Started vLLM on an A100 80GB and ran multiple rounds of session testing.
Purpose
Bug fix for handling requests that fail KV load.
For requests that fail KV load on the decode side, the request is still in the WAITING_REMOTE_KV state, so its num_cached_tokens keeps the default value of -1 and is never updated. When metrics are logged for local_cache_hit, this -1 is passed to the counter and crashes the engine with:
ValueError: Counters can only be incremented by non-negative amounts.
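A minimal sketch of the failure mode and the clamp, assuming a Prometheus-style counter that rejects negative increments. `Counter`, `Request`, and `log_local_cache_hit` are hypothetical stand-ins written for illustration, not vLLM's actual classes or metrics code.

```python
from dataclasses import dataclass


class Counter:
    """Hypothetical stand-in for a Prometheus-style counter.

    Mirrors the non-negative check behind the error quoted above.
    """

    def __init__(self) -> None:
        self.value = 0

    def inc(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError(
                "Counters can only be incremented by non-negative amounts.")
        self.value += amount


@dataclass
class Request:
    # -1 is the "not yet computed" default described above.
    num_cached_tokens: int = -1
    status: str = "WAITING_REMOTE_KV"


def log_local_cache_hit(counter: Counter, num_cached_tokens: int) -> None:
    # Hypothetical metrics hook: adds the request's cached tokens to the counter.
    counter.inc(num_cached_tokens)


local_cache_hit = Counter()
failed = Request()  # KV load failed, so num_cached_tokens was never updated

# Before the fix: the raw -1 reaches the counter and raises.
error = None
try:
    log_local_cache_hit(local_cache_hit, failed.num_cached_tokens)
except ValueError as exc:
    error = str(exc)
print(error)

# After the fix: clamp to zero, as the max(0, ...) in this PR does.
log_local_cache_hit(local_cache_hit, max(0, failed.num_cached_tokens))
print(local_cache_hit.value)
```

Clamping at the point where the finished output is built keeps every downstream consumer (metrics, loggers) from having to special-case the -1 sentinel.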
Test Plan
Ran stress tests while intentionally failing random KV transfers to surface the bug.
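The fault-injection approach can be sketched as a seeded loop that randomly fails a fraction of KV loads and feeds each request's num_cached_tokens into a counter. This assumes failed loads leave the -1 default in place, as described above; `simulate_kv_load`, `stress`, and the `fail_rate` parameter are hypothetical and do not correspond to vLLM's KV connector API.

```python
import random
from dataclasses import dataclass


class Counter:
    """Hypothetical stand-in that rejects negative increments."""

    def __init__(self) -> None:
        self.value = 0

    def inc(self, amount: int) -> None:
        if amount < 0:
            raise ValueError(
                "Counters can only be incremented by non-negative amounts.")
        self.value += amount


@dataclass
class Request:
    prompt_tokens: int
    num_cached_tokens: int = -1  # default until the KV load completes


def simulate_kv_load(req: Request, rng: random.Random, fail_rate: float) -> bool:
    """Hypothetical KV load that fails at random.

    On failure the request keeps its default num_cached_tokens of -1.
    """
    if rng.random() < fail_rate:
        return False
    req.num_cached_tokens = rng.randint(0, req.prompt_tokens)
    return True


def stress(num_requests: int, fail_rate: float, clamp: bool, seed: int = 0) -> int:
    """Log local cache hits for many requests; return the failed-load count.

    Raises ValueError if a negative value reaches the counter.
    """
    rng = random.Random(seed)
    counter = Counter()
    failures = 0
    for _ in range(num_requests):
        req = Request(prompt_tokens=rng.randint(1, 512))
        if not simulate_kv_load(req, rng, fail_rate):
            failures += 1
        cached = max(0, req.num_cached_tokens) if clamp else req.num_cached_tokens
        counter.inc(cached)
    return failures


# With the clamp, injected failures no longer crash the logger.
failures = stress(1000, fail_rate=0.1, clamp=True)
print(f"{failures} failed KV loads, no crash")

# Without the clamp, the first failed load raises.
try:
    stress(1000, fail_rate=0.1, clamp=False)
    crashed = False
except ValueError:
    crashed = True
print("unclamped run crashed:", crashed)
```

Seeding the RNG makes a failing run reproducible, which also helps when sharing reproduction steps with reviewers.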
Test Result
Before:
The logger crashes with ValueError: Counters can only be incremented by non-negative amounts.
After:
No crash observed.